# Algorithm-Architecture Co-Optimization of Area-Efficient SDR Baseband for Highly Diversified Digital TV Standards

Kiyotaka Kobayashi, Hidekuni Yomo
Communication Core Devices Development Center, Tokyo R&D Center, Panasonic Corporation, Japan
E-mail: {lastname.firstname}@jp.panasonic.com

Min Li, Raf Appeltans, Hans Cappelle, Amir Amin, Aissa Couvreur, Matthias Hartmann, André Bourdoux, Praveen Raghavan, Antoine Dejonghe, Liesbet Van der Perre
CSI Department, Imec, Belgium
E-mail: {firstname.lastname}@imec.be

Abstract—The rapidly evolving and diversifying wireless landscape demands highly flexible wireless chipsets. SDR solutions are becoming increasingly appealing because of their flexible programmability. However, this flexibility comes at the expense of programmability overhead, which in turn increases the required silicon area. In this work, we show that, with algorithm and architecture co-design, SDR solutions can be very competitive even when compared to highly optimized ASICs. Specifically, this article presents an SDR baseband design that supports ISDB-T, DVB-T and ATSC with a silicon area comparable to that of the set of ASICs required to handle the three standards separately.

#### I. INTRODUCTION

Nowadays, mobile devices integrate an increasing variety of fast-evolving wireless communication standards, including cellular communications, wireless local, personal and metropolitan area networks, digital broadcasting and so on. In addition, each standard defines a multitude of modes. This tremendous diversity calls for highly flexible radio implementations. At the same time, the design complexity and design cost of deep sub-micron silicon increase exponentially. Hence, to tackle the challenge of development cost, semiconductor vendors prefer highly flexible platforms that are designed once for multiple products or even multiple product lines. Motivated by the above, the Software Defined Radio (SDR) baseband has become an attractive solution for communication silicon vendors. The Tier-2 SDR paradigm [1], where the entire baseband runs on programmable or reconfigurable architectures, is now a viable way to obtain the aforementioned flexibility. Baseband platforms using programmable processors have attracted extensive interest in recent years.

The intrinsic efficiency of flexible solutions is inherently lower than that of dedicated solutions such as ASIC (Application Specific Integrated Circuit) implementations. Hence, when implementing a given signal processing block such as an FFT (Fast Fourier Transform), the SDR baseband solution will require more silicon area, thus becoming less competitive in this respect. However, practical baseband receivers often consist of a large number of different blocks. Taking into account that hardware resources in an SDR baseband can be efficiently multiplexed by many different signal processing blocks, the overall area efficiency of a well-optimized SDR baseband should be competitive with, and possibly higher than, that of an ASIC. However, to the best of our knowledge, such competitiveness has not been demonstrated yet. In most of the previous literature, SDR baseband solutions were reported to have less competitive area efficiency than their ASIC counterparts [2]. In fact, area efficiency is one of the most important factors that determine the final cost of commercial chipsets. Hence, achieving competitive area efficiency is crucial for SDR baseband.

Motivated by the above, we conducted work towards the optimization of SDR baseband solutions. This work led to results that demonstrate the competitiveness of these solutions also in terms of silicon area requirements. We mostly consider multi-standard terrestrial digital TV receiver solutions. To make a fair comparison, we focus on three mainstream standards that are very mature: ISDB-T, DVB-T and ATSC. These three standards are well established in the market, so highly optimized ASIC designs are readily available for comparison. Although we focus on terrestrial digital TV, other emerging wireless communication standards are considered as well. Specifically, the performance of IEEE 802.11n and 3GPP LTE-A (Advanced) inner receivers is also studied in this work. On state-of-the-art SDR baseband platforms, the inner receiver (for synchronization and data detection) and the outer receiver (for forward error correction) are normally implemented on different ASIPs (Application Specific Instruction Processors) [2]. This paper focuses on the inner part of the receiver, which has fully-fledged signal processing functionalities.

In this work, we took a co-design approach that tightly couples the algorithm and architecture design. Such a holistic optimization approach ensures that algorithm and architecture fit each other, so that the hardware can be efficiently utilized. This eventually translates into very high area efficiency. Specifically, with such optimizations, our design requires an area comparable to that of the ASIC counterparts. When normalized to 90nm technology, the ASIC counterparts require 5.11 mm² in total (the ATSC equalizer ASIC reported in [3] requires 4.05 mm² and the DVB-T receiver ASIC reported in [4] requires 1.06 mm²; the ISDB-T receiver shares the circuit of DVB-T), whereas our SDR design requires 5.94 mm² to implement the same functionality. Hence, our work demonstrates the area competitiveness and flexibility of SDR baseband solutions.



Figure 1. The Algorithm/Architecture Co-Design Flow with Joint Optimizations [10]

The rest of this paper is organized as follows: Section II gives the background, Section III introduces the algorithm and architecture co-design flow, Section IV gives an overview of the algorithms and architectures in this work, Section V presents results, and Section VI concludes the paper.

#### II. BACKGROUND

# A. Digital TV Standards and Their Receiver Algorithms

Global terrestrial digital TV standards are summarized in TABLE I, which highlights only the three most widely used standards. ISDB-T is very similar to DVB-T, whereas ATSC is quite different from both.

The considered receiver is depicted in Fig. 2. Received signals pass through the analog front-end and are then analog-to-digital converted, followed by time synchronization. The ISDB-T (and likewise DVB-T) receiver performs clock frequency synchronization and narrow-band (within the sub-carrier interval) carrier frequency synchronization. FFTs are performed to move to the frequency domain, and the results are used for wide-band (beyond the sub-carrier interval) carrier frequency synchronization. Channel coefficients are estimated based on the pilot symbols. The channel estimates are forwarded to the detector together with the received data signals. Finally, the detection is performed. Details of the algorithm can be found in [4][5].

On the other hand, the ATSC receiver applies a decision feedback equalizer (DFE) together with an adaptive algorithm so as to reduce the variance of the channel estimation error. Although many high-performance algorithms exist, the least-mean-square (LMS) algorithm is often used in practice for complexity reasons, in particular when the filters are long [6]. Decision feedback equalizers, consisting of a feedforward filter (FFF) and a feedback filter (FBF), are preferred to linear equalizers because they are more effective at reducing inter-symbol interference (ISI), mainly because they cancel the post-cursor portion of the ISI very efficiently. A detailed treatment of the DFE is provided in [7], showing how to optimally compute the weight coefficients, provided that the channel impulse response is known at the receiver. When this is not the case and the transmission extends over a large number of symbols (as in broadcasting systems), an adaptive DFE provides the means to compute the filter coefficients adaptively without any prior knowledge of the channel. Details of the algorithm can be found in [3].
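As an illustration of the adaptive DFE described above, the following is a minimal floating-point sketch of a real-valued LMS-DFE. It is not the implementation of [3] or of this work: the function names, the tap counts, the step size `mu`, the pass-through initialization and the two-tap echo channel are all assumptions chosen to keep the example small.

```python
import random

VSB8_LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]  # 8-VSB amplitude levels


def slicer(x, levels=VSB8_LEVELS):
    """Hard decision: return the constellation level closest to x."""
    return min(levels, key=lambda s: abs(s - x))


def make_echo_channel_data(n, echo=0.5, seed=1):
    """Hypothetical test channel: each symbol plus one post-cursor echo."""
    rng = random.Random(seed)
    sym = [rng.choice(VSB8_LEVELS) for _ in range(n)]
    rx = [s + echo * (sym[k - 1] if k else 0) for k, s in enumerate(sym)]
    return sym, rx


def lms_dfe(rx, n_fff=2, n_fbf=1, mu=0.002, training=()):
    """LMS-adapted DFE: y = FFF(received samples) - FBF(past decisions).

    While training symbols are available they replace the slicer output
    as the reference; afterwards the equalizer runs decision-directed.
    """
    fff = [1.0] + [0.0] * (n_fff - 1)   # feedforward taps (assumed init)
    fbf = [0.0] * n_fbf                 # feedback taps
    ff_line = [0.0] * n_fff             # delay line of received samples
    fb_line = [0.0] * n_fbf             # delay line of past references
    decisions = []
    for n, x in enumerate(rx):
        ff_line = [x] + ff_line[:-1]
        y = (sum(w * v for w, v in zip(fff, ff_line))
             - sum(w * v for w, v in zip(fbf, fb_line)))
        d = slicer(y)
        ref = training[n] if n < len(training) else d
        e = ref - y                     # error driving the LMS update
        fff = [w + mu * e * v for w, v in zip(fff, ff_line)]
        fbf = [w - mu * e * v for w, v in zip(fbf, fb_line)]
        fb_line = [ref] + fb_line[:-1]
        decisions.append(d)
    return decisions
```

For example, `sym, rx = make_echo_channel_data(3000)` followed by `lms_dfe(rx, training=sym[:2000])` trains on the first 2000 symbols and then continues decision-directed. In this noiseless toy channel the single feedback tap converges towards the echo amplitude, which is the post-cursor cancellation mentioned in the text.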

## B. Logic Area Estimation of Multi-standard ASIC

As discussed in [2], it is not possible to perform a fair comparison of the various solutions unless a benchmark for multi-standard operation of digital TV is defined. The data memory area in the ASIC and in the SDR baseband can be regarded as the same, because there is no fundamental difference in how the memory is used (in practice, since OFDM dominates the data memory area, ATSC can share part of it). We have therefore estimated the logic area of a multi-standard ASIC based on the developments reported for each standard [3][4][5]. As mentioned in Section I, only the inner receiver part (named Demodulation in Fig. 2) is considered. Areas reported for other processes are normalized to the 90 nm process.

ISDB-T can share the digital baseband components with DVB-T because of the similarity of features summarized in TABLE I. Note that this assumption is unfavorable for the SDR baseband, because in practice the area would increase by some percentage when both standards are implemented in one ASIC. With regard to memory area, peripheral circuit overhead would normally have to be considered, so pure linear scaling is not ideal. Nevertheless, we use linear scaling based on the density published in [8] to estimate the memory area. The area of DVB-T is estimated under the assumption that the demodulation part occupies 60% of the whole digital baseband [4], resulting in 1.06mm<sup>2</sup>. The area of ATSC is estimated under the assumption that the equalizer (demodulation) part occupies 40% of the whole digital baseband [3]. The latest ATSC Receiver Performance Guidelines [9] suggest that the channel impulse response for a signal from a single DTV transmitter can often be expected to range from -40 µs (pre-echo) to +50 µs (post-echo), possibly +/-70 µs. In practice, we consider the range from -40 µs to +60 µs, which requires 1076 taps. Therefore, the area of ATSC is multiplied by a coefficient of 2.10 (=1076/512 [3]), which results in 4.05mm<sup>2</sup>. Complete details are provided in TABLE II. We estimate the area of the multi-standard ASIC covering the three standards as 5.11mm<sup>2</sup>.

TABLE I. FEATURES OF GLOBAL TERRESTRIAL DTV STANDARDS

|                           | ISDB-T                        | DVB-T                   | ATSC                    |
|---------------------------|-------------------------------|-------------------------|-------------------------|
| Adopted<br>Country/Area   | Japan,<br>South America       | Europe                  | North America,<br>Korea |
| Channel<br>Band Width     | 6MHz                          | 6/7/8MHz                | 6MHz                    |
| Transmission              | Multi-carrier<br>(OFDM)       | Multi-carrier<br>(OFDM) | Single-carrier          |
| Number of<br>subcarriers  | 1405, 2809, 5617              | 1705, 6817              | 1                       |
| FFT points<br>(K=1024)    | 2K, 4K, 8K<br>(=mode 1, 2, 3) | 2K, 8K                  | -                       |
| Modulation                | QPSK,<br>16/64QAM             | QPSK,<br>16/64QAM       | 8VSB                    |

TABLE II. ESTIMATED LOGIC AREA OF MULTI-STANDARD ASIC

| Standard | Estimation |
|----------|------------|
| ISDB-T [5] | 0 mm² (regarded as the same as DVB-T) |
| DVB-T [4] | Gates (including memory) = 1400k gates = 15.6 mm² @ 180nm<br>Estimation of memory bits = {8192×2×1.25 (sync buffer with 1/4 GI) + 8192 (demod) + 6817 (num. of subcarriers for channel estimation)} x 12bit x 2 (I,Q) + 6817 x 6bit (64QAM) x 4bit (LLR*) = 1015k bits<br>Data memory area = 1015k / 477k [8] = 2.13 mm² @ 90nm (52M bit / 109 mm² = 477k bit/mm²)<br>Logic area (normalized to 90nm) = 15.6 mm² x (90nm/180nm)² - 2.13 mm² = 1.77 mm²<br>Under the assumption of Demodulation ratio = 60%: 1.77 x 0.6 = 1.06 mm²<br>* LLR: Log-Likelihood Ratio |
| ATSC [3] | Gates (not including memory) = 2.9M Tr / 4 Tr (2-input NAND gate) = 725k gates<br>EQ block assumed to occupy 40% of gates = 725k x 0.4 = 290k gates = 1.93 mm² (assuming a gate density of 150k gates/mm² @ 90nm)<br>Total required taps, covering -40 µs to +60 µs = 1076 (= 10.76M sample/sec x 100 µs)<br>Logic area = 1.93 x (1076/512) = 4.05 mm² |
| Area of multi-<br>standard ASIC | 0 + 1.06 + 4.05 = 5.11 mm² |
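The arithmetic behind TABLE II can be retraced with a few lines of Python. This is only a sketch of the estimate as stated in the table; the variable and function names are ours. Note that carrying full precision gives about 4.06 mm² for ATSC, while the table's 4.05 mm² follows from multiplying the rounded intermediates 1.93 and 2.10.

```python
def scale_area(area_mm2, from_nm, to_nm):
    """Ideal quadratic area scaling between two process nodes."""
    return area_mm2 * (to_nm / from_nm) ** 2


# --- DVB-T [4]: 15.6 mm2 at 180 nm, memory included ---
sram_bits_per_mm2 = 52e6 / 109                 # density from [8], ~477k bit/mm2
memory_bits = ((8192 * 2 * 1.25 + 8192 + 6817) * 12 * 2   # I/Q buffers
               + 6817 * 6 * 4)                            # LLR storage
memory_mm2 = memory_bits / sram_bits_per_mm2              # ~2.13 mm2
dvbt_logic_mm2 = scale_area(15.6, 180, 90) - memory_mm2   # ~1.77 mm2
dvbt_demod_mm2 = dvbt_logic_mm2 * 0.60                    # ~1.06 mm2

# --- ATSC [3]: 2.9M transistors of logic, 4 transistors per NAND2 gate ---
atsc_gates = 2.9e6 / 4                         # 725k gates
eq_gates = atsc_gates * 0.40                   # equalizer share, 290k gates
eq_mm2_512_taps = eq_gates / 150e3             # ~1.93 mm2 at 150k gates/mm2
taps_needed = 10.76e6 * 100e-6                 # 100 us span -> 1076 taps
atsc_eq_mm2 = eq_mm2_512_taps * taps_needed / 512

# --- Multi-standard total (ISDB-T counted as 0, shared with DVB-T) ---
total_mm2 = 0.0 + dvbt_demod_mm2 + atsc_eq_mm2

print(f"DVB-T demod: {dvbt_demod_mm2:.2f} mm2")
print(f"ATSC EQ:     {atsc_eq_mm2:.2f} mm2")
print(f"Total:       {total_mm2:.2f} mm2")
```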

# III. OVERVIEW OF THE CO-DESIGN FLOW

When designing an ASIC baseband, the hardware can be maximally customized for the targeted algorithms, such as FFTs. With SDR, however, the baseband processor has to fit many different algorithms. In addition, although baseband processors can be customized as well, the processor architecture and the associated compilers often have many fundamental constraints that severely restrict the design. Hence, the design flow is more challenging than that of an ASIC baseband. The design flow that we have applied is depicted in Fig. 1 [10]. It is split into 2 parts: the algorithm/software side and the architecture/hardware (for the SDR baseband processor) side. As opposed to the traditional design paradigm, the 2 parts are jointly optimized to enable highly efficient baseband signal processing.



FE: analog front end, A/D: analog to digital converter, N(W)SYNC: narrow(wide)-band frequency synchronization, FFT: fast Fourier transform, CHE: channel estimation, FF(B)F: feed forward(back) filter, DET: detection, CoUP: coefficient update, FEC: forward error correction, MPEG: Motion Picture Experts Group

Figure 2. Diagram of ISDB-T, DVB-T and ATSC Receiver

## A. The Algorithm/Software Side

On the algorithm/software side, the design flow starts from signal processing algorithm design, which mostly produces high-level floating-point Matlab code. In the high-level Matlab code, algorithmic details are not fully specified. For instance, how FFTs and filters are performed is not specified. The purpose of this step is to build and verify the signal processing functionality. Then, the high-level Matlab code is refined to intermediate-level code by adding full algorithmic details and algorithmic transformations. For instance, an FFT may be expanded to a Radix-2 based algorithm. From this step on, the architecture/hardware side starts to influence the algorithm. After this step, quantization is performed to convert all signals to fixed point. On the quantized Matlab code, further algorithmic transformations are performed. In this step, we optimize not only computational complexity, but also data access behavior, address generation and compiler friendliness. Once this is finished, C code generation and rewriting can start, which provides the input for compilation and mapping on the targeted SDR baseband processor.
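The refinement steps described above can be illustrated with a small sketch, written here in Python rather than Matlab. It shows a high-level DFT definition, its refinement into a recursive Radix-2 decimation-in-time FFT, and a 16-bit quantizer of the kind used when moving to fixed point. The function names and the Q1.15 format are illustrative assumptions, not the code or word-length choice of this work.

```python
import cmath


def dft(x):
    """High-level reference: direct evaluation of the DFT definition."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n)
                for k in range(n))
            for m in range(n)]


def fft_radix2(x):
    """Refined algorithm: recursive Radix-2 decimation-in-time FFT.

    The length of x must be a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for m in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * m / n) * odd[m]  # twiddle * odd part
        out[m] = even[m] + t            # butterfly, upper output
        out[m + n // 2] = even[m] - t   # butterfly, lower output
    return out


def quantize_q15(v):
    """Round v to signed 16-bit Q1.15 with saturation; return it as a float."""
    q = max(-32768, min(32767, round(v * 32768)))
    return q / 32768
```

Comparing `fft_radix2` against `dft` on the same input mirrors the verification performed at each refinement step: the refined code must reproduce the high-level functionality before quantization effects are introduced.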

#### B. The Architecture/Hardware Side

In our work, we use an extensible baseband processor architecture template, which allows many different processor architectures to be instantiated for different requirements [11]. Hence, on the architecture/hardware side, the starting point is architecture feature selection and exploration, which yields the architecture template with its key features. Moving toward specialized processors, the next step is the functional definition of accelerations, which means identifying important signal processing blocks to be accelerated on the targeted processor, for instance the channel estimation (CHE) of ISDB-T in Fig. 2. In this step, only the functional aspect is identified; implementation details are not yet specified. The next step is then to specify how to realize the acceleration with instructions or accelerators. This step produces detailed I/O definitions for special instructions or accelerators. For instance, CHE may require an insert intrinsic that copies a single 32-bit complex number (16-bit I-value and 16-bit Q-value) to all the elements of a vector, for the purpose of enabling better utilization of SIMD instructions. Based on these I/O definitions, bit-level algorithms are designed and hardware is implemented in the next step for the desired instructions/accelerators.

Figure 3. Illustrative Example for The ADRES Template [10]
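To make the I/O definition of such an insert intrinsic concrete, the following sketch models its behavior in Python. The paper does not give the bit layout or the intrinsic's name, so placing I in the low 16 bits, the use of 8 slots for a 256-bit vector, and all function names are assumptions made for illustration only.

```python
def pack_complex16(i_val, q_val):
    """Pack signed 16-bit I and Q into one 32-bit word.

    Assumed layout: I in the low half, Q in the high half.
    """
    return ((q_val & 0xFFFF) << 16) | (i_val & 0xFFFF)


def unpack_complex16(word32):
    """Inverse of pack_complex16: return (I, Q) as signed integers."""
    def to_signed16(v):
        return v - 0x10000 if v & 0x8000 else v
    return to_signed16(word32 & 0xFFFF), to_signed16((word32 >> 16) & 0xFFFF)


def insert_broadcast(word32, slots=8):
    """Model of the insert intrinsic: copy one 32-bit complex word into
    every slot of a vector (8 complex slots for a 256-bit vector)."""
    return [word32] * slots


# Example: broadcast one complex value so that a SIMD multiply can apply
# it to 8 complex samples at once.
vec = insert_broadcast(pack_complex16(-3, 7))
```

Once a scalar is replicated across all slots in this way, a single SIMD multiply can apply it to 8 complex samples at once, which is the utilization benefit the text refers to.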

#### C. Important Co-Optimizations

Importantly, the aforementioned 2 sides are closely synchronized. First of all, algorithmic structures are made compatible with important architecture/compiler features and constraints. Detailed examples of this can be found in [12][13]. In addition, numerical stability is carefully verified together with the architecture customization. Note that most parallel baseband processors incur substantial overhead when handling heterogeneous or very large word-lengths, so tightly controlled word-lengths are crucial. Furthermore, when refining algorithms to lower levels, sharing special instructions and accelerators among different signal processing blocks is treated as an important criterion. As multi-standard DTV includes not only intuitively parallel-friendly multi-carrier systems (ISDB-T, DVB-T), but also a single-carrier system (ATSC), this requires a careful co-exploration of algorithms and instruction/accelerator definitions, which yields algorithms that are based on very similar primitives. Moreover, the bit-level algorithms of instructions/accelerators are co-verified with the signal processing functionalities. Importantly, the cost of instructions/accelerators is minimized while the required accuracy is still met. The aforementioned aspects are crucial for the co-optimization. Beyond the above, there are other important aspects as well, such as memory-constrained algorithm refinement and the design of efficient address generation schemes.

# IV. OPTIMIZATION OF ALGORITHM AND ARCHITECTURE FOR GLOBAL TERRESTRIAL DTV STANDARDS

# A. Architecture Template and Customization

Our work is based on the ADRES template [10]. A simple illustrative example is shown in Fig. 3. The parameterizable template consists of an array of densely interconnected Function Units (FUs) that have local Register Files (RFs) and configuration memory. A limited subset of those FUs is connected to a global RF, which also enables operation as a standard Very Long Instruction Word (VLIW) processor. The array part can be configured in Coarse Grain Array (CGA) mode, which allows a large number of operations to be executed in parallel. A retargetable C compiler, named DRESC, targets both the VLIW and CGA modes. With the ADRES template, we can design a baseband processor with massive parallelism by combining instruction-level parallelism (ILP), data-level parallelism (DLP) and extended custom instructions. Key high-level features are summarized in TABLE III. Given that cost is the top priority for consumer devices, the amount of parallelism of the baseband processor is not made larger than that of previous state-of-the-art baseband processors. Our custom baseband processor performs 64 16-bit operations in parallel on the vector processing part, which is fewer than on the NXP EVP [14] and SODA [15]. Details are described in [10].

TABLE III. HIGH LEVEL FEATURE OF PROCESSOR ARCHITECTURE

| Feature                     | Description |
|-----------------------------|-------------|
| Scalar<br>processing        | 30 FUs supporting scalar operations, used for address generation, loop control, etc. |
| Vector<br>processing        | Each vector FU supports 256-bit Single Instruction Multiple Data (SIMD), i.e. 16 real SIMD slots of 16 bits each, or 8 complex SIMD slots of 32 bits each<br>- 4 vector FUs supporting multiply-and-accumulate<br>- 8 shuffle FUs supporting shift per slot |
| Vector memory access        | 6 FUs that are connected to both 4 vector memories and 4 vector FUs |
| Vector/Scalar communication | 6 packing/unpacking FUs connecting vector FUs and scalar FUs |

In this work, the customized processor instance is dimensioned not only for multi-carrier-based ISDB-T and DVB-T, but also for single-carrier-based ATSC. An example of the co-optimization is shown in Fig. 4 and Fig. 5. The input of the IIR (Infinite Impulse Response) filter is the sequentially detected data based on its own feedback, so only one 16-bit sample is ready for filtering per symbol. In order to make the best use of the 256-bit SIMD, a 16-tap FIR (Finite Impulse Response) filter and a 16-symbol delay are introduced into the original algorithm. The 16-symbol delay works as a 256-bit buffer. On the architecture side, a new intrinsic, shuffle, is introduced to match the updated algorithm. As the LMS-DFE algorithm multiplies an array of data in the FFF and FBF by a scalar (the error) symbol by symbol, the shuffle supports a data shift per slot between 2 vectors. Based on these two updates, the loop structure is composed so as to operate efficiently. As described in Fig. 5, the coefficient update of the FFF and FBF is performed once every 2 symbols. Note that, from the bit error rate perspective, we have confirmed that no noticeable degradation occurs under the channel conditions of the ATSC guidelines [9]. The 30 scalar FUs and the 8 shuffle FUs of the vector processing part are the result of this optimization.
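A behavioral model may help to picture the shuffle intrinsic described above. Since Fig. 5 is not reproduced here, the slot ordering, the argument order and the function names below are assumptions; the sketch only shows the idea of extracting a slot-shifted 16-slot window from two adjacent vectors and then scaling it by the scalar error in a single vector operation.

```python
SLOTS = 16  # one 256-bit vector holds 16 slots of 16 bits (TABLE III)


def shuffle(older, newer, shift):
    """Return SLOTS consecutive slots from the concatenation older + newer,
    starting at slot index `shift` (0 gives `older`, SLOTS gives `newer`)."""
    if len(older) != SLOTS or len(newer) != SLOTS:
        raise ValueError("both inputs must be full vectors")
    if not 0 <= shift <= SLOTS:
        raise ValueError("shift must be between 0 and SLOTS")
    return (older + newer)[shift:shift + SLOTS]


def scaled_accumulate(coeffs, data, scale):
    """LMS-style update of one coefficient vector:
    coeffs[i] += scale * data[i] for every slot."""
    return [c + scale * d for c, d in zip(coeffs, data)]


# Two aligned vectors taken from a delay line; a window shifted by 3 slots
# is formed without any unaligned memory access.
vec_a = list(range(0, 16))
vec_b = list(range(16, 32))
window = shuffle(vec_a, vec_b, 3)
updated = scaled_accumulate([0] * SLOTS, window, 2)
```

The point of the design choice is that the delay line advances by one slot per symbol while vector memory accesses stay aligned to 256-bit boundaries, so the per-symbol window is built by shifting between two already-loaded vectors.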



Figure 4. Algorithm Optimization for Efficient Use of 256-bit SIMD



Figure 5. Architecture Optimization (shuffle instruction)

#### V. EXPERIMENTAL RESULTS

# A. ADRES Cycle Counts and Estimated Area

The cycle counts for ATSC and ISDB-T are summarized in TABLE IV. The ISDB-T processing, which is required every 1.125 ms including the Guard Interval (GI), takes 168.3k cycles to complete. This corresponds to only 38% of the cycle budget at a 400MHz clock frequency. Note that this clock frequency is practically feasible, because even the previous generation of the ADRES-based SDR baseband processor in a 90nm process achieves 400MHz in the worst case [1]. The ATSC processing, which is required every 0.093 µs, takes 36 cycles to complete. This corresponds to 97% of the cycle budget at a 400MHz clock frequency. Given that the ASIC area ratio is about 3.82 (= 4.05mm²/1.06mm²), the cycle ratio of 2.56 (= 97%/38%) is plausible. Note that the estimated cycle counts for DVB-T would be the same as those for ISDB-T.
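The cycle budgets quoted above follow directly from the clock frequency and the symbol period, as the short sketch below retraces (the variable and function names are ours). One caveat: dividing 168.3k by 450k gives about 37.4%, which the paper reports as 38%; the code does not try to reproduce that rounding.

```python
def cycle_budget(clock_hz, period_s):
    """Number of clock cycles available within one processing period."""
    return clock_hz * period_s


CLOCK_HZ = 400e6

# ATSC: one symbol every 0.093 us, 36 cycles used
atsc_budget = int(cycle_budget(CLOCK_HZ, 0.093e-6))   # 37 whole cycles
atsc_load = 36 / atsc_budget                          # ~0.97

# ISDB-T: one OFDM symbol every 1.125 ms (1/8 GI), 168.3k cycles used
isdbt_budget = cycle_budget(CLOCK_HZ, 1.125e-3)       # 450k cycles
isdbt_load = 168.3e3 / isdbt_budget                   # ~0.374

# Ratio of the ASIC logic areas, for comparison with the load ratio
asic_area_ratio = 4.05 / 1.06                         # ~3.82

print(f"ATSC load:   {atsc_load:.1%}")
print(f"ISDB-T load: {isdbt_load:.1%}")
print(f"ASIC area ratio ATSC/DVB-T: {asic_area_ratio:.2f}")
```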

In order to estimate the ADRES area, also summarized in TABLE IV, physical synthesis has been performed on the Functional Units and Register Files. The area of the configuration memory has also been taken into account to allow a fair comparison with the multi-standard ASIC. A routing factor of 1.66 has been applied to ensure that the final area estimate takes into account the cell density after place and route. This routing factor is based on the previous generation of ADRES [1]. The estimated area can therefore be regarded as a post-place-and-route figure. The total estimated area is 5.94mm<sup>2</sup>, which is 1.16 times the area of the multi-standard ASIC in Section II.B. We conclude that the software defined multi-standard receiver is competitive in area while retaining its inherent flexibility.

Furthermore, in order to investigate how the same hardware can be reused for other standards, we performed cycle count estimations based on available experimental results [10]. We have used fully-fledged receivers for IEEE 802.11n and Cat-E LTE-A on a similar processor architecture that has a different parameterization. Based on the architecture parameters and the measured cycle counts, we extrapolate the potential performance that can be achieved with the processor introduced in this paper. For wireless LAN, the payload processing of a 4x4 40MHz 802.11n receiver is considered; for Cat-E LTE-A (2 streams of 20MHz 2x2 MIMO), the user data processing is considered. According to this extrapolation, the introduced baseband processor can handle both cases within the cycle budget. Hence, we conclude that there is promising potential for reusing the introduced baseband processor.

TABLE IV. ADRES CYCLE COUNTS AND ESTIMATED AREA

|                                                                                                                | ATSC                                                                                                               | ISDB-T                       |
|----------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|------------------------------|
| Symbol duration                                                                                                | 0.093us                                                                                                            | 1.125ms (1/8GI)              |
| operating frequency                                                                                            | 400MHz                                                                                                             | 400MHz                       |
| Cycle budget (real-time operation)                                                                             | 37 (=400MHz x<br>0.093us)                                                                                          | 450k (=400MHz x<br>1.125ms)  |
| Actual cycles                                                                                                  | 36 (97% of cycle budget)                                                                                           | 168.3k (38% of cycle budget) |
| Estimated Area of ADRES<br>(configuration memory included,<br>normalized to 90nm,<br>physical synthesis level) | 5.94mm <sup>2</sup><br>(1.16 times(=5.94mm <sup>2</sup> /5.11mm <sup>2</sup> )<br>compared to multi-standard ASIC) |                              |

#### VI. CONCLUSION

In this paper, we have presented an overview of the algorithm-architecture co-optimization for an SDR-based multi-standard DTV receiver. We have shown that SDR baseband can indeed achieve competitive area efficiency. When considering ISDB-T, DVB-T and ATSC, the presented SDR baseband requires an area that is competitive with that of highly optimized ASIC counterparts. Hence, when more standards are considered, the area cost of SDR baseband is expected to become even more competitive. This demonstrates the advantage of SDR baseband for consumer electronics.

#### REFERENCES

- [1] http://www.wirelessinnovation.org/ (http://www.sdrforum.org)
- [2] Ramacher, U., "Software-Defined Radio Prospects for Multistandard Mobile Phones," Computer, vol. 40, no. 10, pp. 62-69, Oct. 2007
- [3] N. Tokunaga et al., "Development of VSB demodulator LSI with high-performance waveform equalizer," IEEE International Conference on Consumer Electronics (ICCE) 2000, pp. 42-43, June 2000
- [4] Kai-Yuan Jheng, et al., "A DVB-T baseband demodulator design based on multimode silicon IPs," IEEE VLSI-TSA-DAT 2005, pp.49–52
- [5] K. Hayashi et al., "Development of Key Technologies for OFDM Receiver: Application for Digital Terrestrial Television Broadcasting," ITE Technical Report Vol.23, No.28, pp. 25-30, Mar. 1999
- [6] J. Proakis, C. Rader, F. Ling, M. Moonen, K. Proudler, and C. Nikias, Algorithms for Statistical Signal Processing. Prentice Hall, 2002.
- [7] N. Al-Dhahir and J. Cioffi, "MMSE Decision Feedback Equalizers: Finite-length Results," IEEE Transactions on Information Theory, vol. 41, no. 4, pp. 961–975, July 1995.
- [8] Thompson, S. et al., "A 90 nm logic technology featuring 50 nm strained silicon channel transistors, 7 layers of Cu interconnects, low k ILD, and 1 μm² SRAM cell", IEEE Electron Devices Meeting (IEDM), 2002, pp.61-64, Dec. 2002
- [9] ATSC Standard A/74:2010, ATSC Recommended Practice, Table 5.6
- [10] M. Li et al., "Overview of A Software Defined Downlink Inner Receiver for Category-E LTE-Advanced UE," IEEE International Conference on Communications (ICC) 2011, pp.1–5, June 2011
- [11] B. Mei et al., "Architecture exploration for a reconfigurable architecture template," *IEEE Des. Test*, vol. 22, no. 2, pp. 90–101, 2005.
- [12] M. Li et al., IEEE ICC 2008, pp. 737–741, May. 2008.
- [13] M. Li et al., Signal Processing, IEEE Transactions on, vol. 57, no. 4, pp. 1604–1615, April 2009
- [14] K. van Berkel et al., "Vector processing as an enabler for software-defined radio in handheld devices," EURASIP J. Appl. Signal Process., vol. 2005, no. 1, pp. 2613–2625, 2005.
- [15] Y. Lin et al., "SODA: A high-performance DSP architecture for software-defined radio," IEEE Micro, vol. 27, no. 1, pp. 114–123, 2007.
- [16] B. Bougard et al., "A coarse-grained array based baseband processor for 100Mbps+ software defined radio," Design, Automation and Test in Europe (DATE) 2008, pp. 716–721, 10-14 March 2008.